DIAgui: a Shiny application to process the output from DIA-NN

Abstract
Summary: DIAgui is an R package that simplifies the processing of the report file from the DIA-NN software through a Shiny application. It returns the quantification of the precursors, the peptides, the proteins, or the genes using the MaxLFQ algorithm. In addition, the latest version provides Top3 and iBAQ quantification and the number of peptides used for the quantification. DIAgui thus produces ready-to-interpret files from the results of DIA mass spectrometry analysis and provides user-friendly visualization and statistical tools.
Availability and implementation: Code and documentation are available on GitHub at https://github.com/marseille-proteomique/DIAgui. The package is written in R and also uses C++ code. A vignette shows its use in an R command line workflow.


Introduction
Data-independent acquisition (DIA) proteomics is a recently developed global proteomics strategy based on mass spectrometry (MS). It is increasingly used in proteomics studies because it offers broad protein coverage, high reproducibility, and accuracy (Hu et al. 2016, Krasny and Huang 2021). In a DIA acquisition, precursor ions are isolated within pre-defined isolation windows and then fragmented together, in contrast to conventional data-dependent acquisition (DDA), where each precursor is isolated at a specific m/z. All fragment ions in each window are then analyzed by a high-resolution mass spectrometer. The analysis of the resulting, much more complex MS2 spectra is now greatly facilitated by artificial intelligence. Many software tools are available, such as Spectronaut, DIA-NN, or Skyline. In a recent benchmark (Gotti et al. 2021), DIA-NN was found to be one of the most efficient. As it is also free, DIA-NN is increasingly used in the proteomics community. However, to take full advantage of DIA-NN, its author strongly advises filtering the DIA-NN output using R and then applying the MaxLFQ algorithm to obtain a better quantification (Demichev et al. 2020). Furthermore, DIA-NN does not offer absolute quantification methods such as Top3 (Grossmann et al. 2010) or iBAQ (Brönstrup 2004), which are available, for example, in the state-of-the-art DDA software MaxQuant. To automate post-processing with R and make it possible to compute absolute quantification from DIA-NN output, we propose DIAgui, an R package (based on the diann R package) that provides a user-friendly interface. With DIAgui, users can process DIA-NN results without knowing R and obtain more complete and interpretable results.

Description
DIAgui is applicable to all types of proteomic experiments as long as the data were processed through DIA-NN. Only the 'report.tsv' file is required. DIAgui contains two main functions: report_process, which analyzes the report file with a single R command, and runDIAgui, which launches the Shiny application to process the report file interactively.
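To give a sense of the long-format report that both functions consume, the following minimal Python sketch reads a tiny report-like table and applies a q-value filter of the kind recommended for DIA-NN output. It is an illustration only and does not reproduce the DIAgui R code: the column names are assumed to match those of the DIA-NN main report, while the function name filter_report, the 1% cutoffs, and all values in the table are hypothetical.

```python
import csv
import io

# Tiny stand-in for a DIA-NN long-format report (tab-separated).
# Column names are assumed to match the DIA-NN main report; values are invented.
REPORT_TSV = (
    "Run\tProtein.Group\tPrecursor.Id\tProteotypic\tQ.Value\tPG.Q.Value\tPrecursor.Normalised\n"
    "run1\tP12345\tAAAK2\t1\t0.001\t0.002\t1500.0\n"
    "run1\tP12345\tCCCR2\t1\t0.05\t0.002\t900.0\n"
    "run1\tP67890\tDDDK3\t0\t0.002\t0.004\t700.0\n"
    "run2\tP12345\tAAAK2\t1\t0.003\t0.02\t1400.0\n"
    "run2\tP67890\tEEEK2\t1\t0.004\t0.005\t300.0\n"
)


def filter_report(tsv_text, q_max=0.01, pg_q_max=0.01, proteotypic_only=True):
    """Keep rows that pass the precursor and protein-group q-value cutoffs.

    Hypothetical helper: the 1% defaults are a common choice, not
    necessarily the defaults used by DIAgui.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        if float(row["Q.Value"]) > q_max:
            continue  # precursor-level q-value too high
        if float(row["PG.Q.Value"]) > pg_q_max:
            continue  # protein-group q-value too high
        if proteotypic_only and row["Proteotypic"] != "1":
            continue  # precursor maps to more than one protein
        kept.append(row)
    return kept


rows = filter_report(REPORT_TSV)
all_specificities = filter_report(REPORT_TSV, proteotypic_only=False)
for row in rows:
    print(row["Run"], row["Protein.Group"], row["Precursor.Id"])
```

Each row of the report describes one precursor in one run, which is why filtering is applied row by row before any quantity is aggregated to the peptide, protein group, or gene level.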

The Shiny application
The DIAgui app is divided into four main tabs. The first one allows the user to import the report file (usually named 'report.tsv') resulting from DIA-NN. Once the data are loaded, the user can rename the LC-MS files, which are named by default after the paths of the raw files. The next two tabs allow the extraction of datasets at different levels: precursors, peptides, protein groups, or genes. For each extraction, the user can filter precursors based on q-values, keep only proteotypic peptides, and remove modified peptides. Each extraction can be downloaded in txt, CSV, or Excel format.

To extract the protein group dataset, the MaxLFQ algorithm (Cox et al. 2014) is used with either the method from the diann R package (https://github.com/vdemichev/diann-rpackage) (Demichev et al. 2020) or the method from the iq R package (Pham et al. 2020). The iq package code is much faster, and the results are equivalent. DIAgui can also calculate Top3 (Grossmann et al. 2010) and iBAQ (Brönstrup 2004) quantification. For the iBAQ calculation, either the FASTA file used during the DIA-NN step must be loaded or the seqinr R package (Charif and Lobry 2007) must be used to query the SwissProt database (SwissProt Database 2008). Using a FASTA file is much faster because DIAgui then performs no query to SwissProt. For precursor, peptide, or gene quantification, DIAgui relies on the diann_matrix function of the diann R package, meaning that the MaxLFQ algorithm is not used. In DIAgui, we improved this function by adding the Top3 quantification with the option to take the sum or the max of the intensities of the same ID. Taking the sum of these intensities is useful to obtain Top3 and iBAQ absolute quantifications, but it should be noted that this calculation uses the unnormalized raw intensity. Moreover, DIAgui reports the number of peptides used for the quantification, which is important to assess the quality of the quantification. Important additions to the quantification are summarized in the supplementary table.

The last tab of DIAgui allows the user to explore the extracted dataset directly and easily. The user can visualize the data using an interactive heatmap, a density plot, a correlation plot, and an MDS plot, or assess the proportion of non-missing values, among other options. The user can also perform missing value imputation and statistical comparison between chosen groups. This exploration is available both for data filtered using DIAgui and for initially uploaded data.
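The Top3 and iBAQ quantifications mentioned above can be illustrated with a minimal Python sketch. It follows the usual textbook definitions (Top3 as the mean of the three most intense peptides of a protein, iBAQ as the summed peptide intensity divided by the number of theoretically observable tryptic peptides) and is not the DIAgui implementation. The function names, the cleavage rule (after K or R, not before P, no missed cleavages), the 6 to 30 residue length range, and all sequences and intensities are assumptions made for the example.

```python
def top3(peptide_intensities, n=3):
    """Mean of the n most intense peptides of one protein in one sample.

    Assumption: if fewer than n peptides are available, the mean of the
    available ones is returned; other implementations may return NA instead.
    """
    top = sorted(peptide_intensities.values(), reverse=True)[:n]
    return sum(top) / len(top) if top else None


def tryptic_peptides(sequence):
    """In-silico digestion: cleave after K or R unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def ibaq(peptide_intensities, sequence, min_len=6, max_len=30):
    """Summed intensity divided by the number of observable peptides."""
    n_observable = sum(
        1 for p in tryptic_peptides(sequence) if min_len <= len(p) <= max_len
    )
    if n_observable == 0:
        return None
    return sum(peptide_intensities.values()) / n_observable


# Invented example data for one protein in one sample.
top3_value = top3({"pepA": 900.0, "pepB": 600.0, "pepC": 300.0, "pepD": 30.0})
sequence = "MAAAAAKGGGGGGRPEPTIDEKLLR"
digest = tryptic_peptides(sequence)
ibaq_value = ibaq({"MAAAAAK": 400.0, "GGGGGGRPEPTIDEK": 100.0}, sequence)
print(top3_value, digest, ibaq_value)
```

The iBAQ denominator depends on the protein sequence, which is why a FASTA file or a database query is needed for iBAQ but not for Top3.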
Finally, DIAgui can also help the user to study the impact of the m/z window selection of their DIA analysis. Using the get_bestwind function or the 'Check other reports' tab under the 'Data visualization and statistics' tab of the app, the user can see what the distribution of the precursors would be as a function of a fixed number of windows or a fixed window size, based on the report-Lib file output from DIA-NN. In addition, the user can allow a fixed overlap between windows. This feature could facilitate the development of DIA methods with variable windows.
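The idea behind this window exploration can be sketched in a few lines of Python. The sketch below is one plausible reading of the description above, not the get_bestwind code: with a fixed window size, windows of equal width (optionally overlapping) are laid over the m/z range and the precursors falling into each are counted; with a fixed number of windows, the edges are assumed to be placed so that each window holds about the same number of precursors. All function names and m/z values are hypothetical; in practice the precursor m/z values would come from the DIA-NN library report.

```python
def fixed_width_windows(mz_min, mz_max, width, overlap=0.0):
    """Equal-width windows; consecutive windows share `overlap` m/z units."""
    step = width - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than width")
    windows, i = [], 0
    while True:
        start = mz_min + i * step
        end = min(start + width, mz_max)
        windows.append((start, end))
        if end >= mz_max:  # the whole range is covered
            return windows
        i += 1


def equal_count_windows(mzs, n_windows):
    """Variable-width windows holding about the same number of precursors."""
    ordered = sorted(mzs)
    edges = [ordered[0]]
    for k in range(1, n_windows):
        edges.append(ordered[k * len(ordered) // n_windows])
    edges.append(ordered[-1])
    return list(zip(edges[:-1], edges[1:]))


def count_per_window(mzs, windows):
    """Count precursors in [lo, hi); the last window also includes hi."""
    counts = []
    for i, (lo, hi) in enumerate(windows):
        is_last = i == len(windows) - 1
        counts.append(
            sum(1 for mz in mzs if lo <= mz < hi or (is_last and mz == hi))
        )
    return counts


# Invented precursor m/z values, denser at low m/z.
mzs = [402.0, 405.0, 410.0, 415.0, 418.0, 422.0, 430.0, 455.0, 470.0, 495.0]

fixed = fixed_width_windows(400.0, 500.0, 25.0)
fixed_counts = count_per_window(mzs, fixed)
overlapping = fixed_width_windows(400.0, 500.0, 25.0, overlap=5.0)
overlapping_counts = count_per_window(mzs, overlapping)
variable = equal_count_windows(mzs, 5)
variable_counts = count_per_window(mzs, variable)
print(fixed_counts, overlapping_counts, variable_counts)
```

With overlapping windows, a precursor lying in the shared region is counted in both windows, so the counts can sum to more than the number of precursors.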
All the features of DIAgui are presented in a video tutorial accessible by clicking the question mark icon or via https://youtu.be/vfvh15Q93eU.

Conclusion
DIAgui is freely available at https://github.com/marseille-proteomique/DIAgui. The user only needs to install R and follow the R package installation procedure. No R coding skills are required. As DIA-NN is increasingly used in DIA-based quantitative proteomics, our user-friendly interface will allow the proteomics community to simplify and accelerate the processing of their DIA analyses and research.

Figure 1. Workflow of DIAgui. After data-independent acquisition mass spectrometry (DIA-MS) of any protein sample, the raw data are analyzed with the DIA-NN software. Then, using our R package DIAgui, the user can analyze the output from DIA-NN in a user-friendly way through the Shiny application included in DIAgui.